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Random Walks, Markov Processes and the 
Multiscale Modular Organization of Complex 

Networks 

Renaud Lambiotte, Jean-Charles Delvenne and Mauricio Barahona 


Abstract —Most methods proposed to uncover communities in complex networks rely on combinatorial graph properties. Usually 
an edge-counting quality function, such as modularity, is optimized over all partitions of the graph compared against a null random 
graph model. Here we introduce a systematic dynamical framework to design and analyze a wide variety of quality functions for 
community detection. The quality of a partition is measured by its Markov Stability, a time-parametrized function defined in terms of 
the statistical properties of a Markov process taking place on the graph. The Markov process provides a dynamical sweeping across 
all scales in the graph, and the time scale is an intrinsic parameter that uncovers communities at different resolutions. This dynamic- 
based community detection leads to a compound optimization, which favours communities of comparable centrality (as defined by 
the stationary distribution), and provides a unifying framework for spectral algorithms, as well as different heuristics for community 
detection, including versions of modularity and Potts model. Our dynamic framework creates a systematic link between different 
stochastic dynamics and their corresponding notions of optimal communities under distinct (node and edge) centralities. We show 
that the Markov Stability can be computed efficiently to find multi-scale community structure in large networks. 
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1 Introduction 

OW the Structure of a network affects the dynamics 
(e.g., diffusion or synchronization) that takes place on 
it has been studied extensively in recent years IB, |2l, IB. 
This relationship is particularly relevant when the network 
is composed of tightly-knit modules or communities i), ID, 
[la, 0, la, 0, which can lead, for instance, to partially 
coherent dynamics cni, im, or to the emergence of co¬ 
operation ca and coexistence of heterogeneous ideas in a 
social network ca. Conversely, it has been proposed that dy¬ 
namical processes such as random walks uni, ca, ca, ca 
and synchronization cni could be used as empirical means 
to extract information about the network and, specifically, to 
uncover its community structure. 

Recently, there has been extensive research on the detection 
of communities and hierarchies in real world systems, ranging 
from social systems to technological and bio-chemical systems 
(for a review see 0). Most of these studies follow from the 
classical problem of graph partitioning and are thus based on 
structural properties of graphs 0, 0- In order to discover 
communities, such methods usually proceed by optimizing a 
quantity that captures what is thought to be the goodness of a 
partition in terms of combinatorial properties of the graph. A 
variety of such quality functions (and associated optimization 
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strategies) have been proposed, including different versions of 
balanced and normalized cuts, as well as modularity and its 
extensions 0, 0 . In general, these combinatorial definitions 
operate by counting the number of links within and between 
the communities, and are thus blind to the flows of information 
taking place on the network. 

In contrast, we adopt here a dynamical viewpoint for the 
analysis of community structure in graphs. Specifically, we 
use statistical properties of a random walk (or its associated 
Markov processes) evolving on a given network to quantify 
the quality of partitions across all time scales. Consider, for 
instance, the simple random walk, where a random walker 
jumps at every step from the node where it sits to one of its 
immediate neighbours with a probability proportional to the 
weight of the link joining the nodes. We define the Markov 
Stability IT4l . IT^ . 1X91 . l2Ql of a partition of the graph 
at time t as the probability of a walker to be in the same 
community at time zero and at time t when the system is at 
stationarity, discounting the expected probability as t ^ oo. 
For an ergodic and mixing random walk (i.e., on an aperiodic, 
strongly connected graph), this limiting probability is the 
probability of two independent walkers to be in the same 
community. The Markov Stability so defined measures the 
quality of a partition in terms of the persistence of the Markov 
dynamics within the communities of the partition within the 
time scale t, i.e., the Markov Stability is large when it is 
unlikely that a random walker will escape the communities 
within time t. Alternatively, the Markov stability can also be 
understood as the time auto-correlation of a coarse-grained 
signal. Hence, a large Markov Stability is equivalent to a non- 
asymptotic time scale separation ED, EH within the diffusion 
dynamics, where the fast dynamics mixes the probability fiow 
inside the communities and the slow dynamics describes the 
transfer of probability between the communities. It can be 
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shown that the Markov Stability so defined, which we will 
make more explicit below, is monotonically decreasing for 
most partitions on most graphs ifT^ . 

The dynamics-based Markov Stability framework for com¬ 
munity detection introduced in Ga, CD, US) has mathe¬ 
matical connections with the wider literature relating random 
walks on graphs and graph properties and allows us to link 
those results with applications in community detection. A 
strong initial motivation for our work was the theory of quasi¬ 
stationary distributions in Markov chains 1^ . 1241 . and the 
theory of quasi-stable (long-lived) states in the physics of 
energy landscapes 1^ . Random walks have been used by a 
variety of methods in graph partitioning and clustering. For 
example, the mixing rate of the random walker is closely 
related to the conductance, a measure of quality for two-way 
partitions (261, (23, (211 • Through their commute times (^ 
or through more general spectral embeddings (30l . random 
walks also allow representations of the graph in a Euclidean 
space on which classic machine learning techniques can be 
used, including clustering. Other partitioning algorithms have 
also made use of random walk measures ifTsl . ED, E2), 
ES The distinguishing feature of the Markov Stability ap¬ 
proach is the systematic sweeping through all time scales, 
fast to slow, in order to discover fine or coarse partitions, 
thus relating characteristic time scales of the dynamics to 
the structural scales present in the network. In constrast, the 
precited methods focus on a fixed time scale (e.g., one-step) 
or a fixed number of communities (e.g., two) and hence 
do not exploit fully the dynamical aspects of the random 
walk. See for a more extensive discussion, and Section 

for an overview of the unifying character of the Markov 
Stability framework, whose dynamical character allows the 
interpolation between the structural (edge-counting) measures 
and the spectral approach to community detection. 

In this article, we extend the Markov Stability formal¬ 
ism and show that any random walk on a given network, 
whether in discrete or continuous time, generates a different 
partition Stability function, and therefore a different notion 
of community reliant on specific measures of node and/or 
edge centrality. Indeed, classical notions of centrality (e.g., 
degree, eigencentrality, pagerank) can be shown to correspond 
to different random walks on the networks. Within this frame¬ 
work, we observe that good communities appear as a result 
of an optimization that balances the cost of severing many or 
highly central edges against a maximum-entropy spread of the 
centrality across communities. This compound optimization is 
parametrically modulated by time, which gives varying weight 
to the energetic cost of the cut against the maximum entropy 
term. At long times, the problem turns out to be solved 
exactly by spectral methods. We show how these dynamical, 
graph-theoretical and optimization concepts are intertwined, 
providing insight on the nature of different community struc¬ 
tures, the centrality optimizations they entail, and associated 
spectral partitioning algorithms known in the literature. Our 
work thus provides a unifying viewpoint for different variants 
and heuristics used in the graph-partitioning, clustering and 
community detection literatures, including several variants of 
null-model-based modularity or spectral algorithms, which 


appear as particular cases of our formalism. Conceptually, our 
work indicates that, rather than searching for a single partition 
at a particular scale, dynamics can be used to unfold and detect 
systematically the relevant partitions by scanning across all 
scales in the graph oa, csi. Similarly, we show here that 
the choice of dynamics can also be used to find the most 
appropriate community structure (if particular information 
about the system is available) or to explore the network 
under different (and complementary) viewpoints to gain deeper 
information about the system. 

The paper is organized as follows. First, the framework is 
introduced via the standard (simple) random walk and its as¬ 
sociated continuous-time processes, including those generated 
by the normalized and combinatorial Laplacians. We show 
how the relevant centrality measure in this case is the degree, 
yet different continuous-time Markov processes (potentially 
relevant for different network dynamics) lead to different 
communities linked to particular heuristic null models used in 
the community detection literature. The dynamical scanning 
implicit in our framework is used to illustrate the detection of 
community structure across scales in several examples without 
imposing the scale or number of communities a priori. Part of 
these results were reported in the unpublished preprint (T^ . 
We then consider the analysis of less standard random walks, 
specifically the Ruelle-Bowen case, and show that its notion 
of community is based on a different kind of centrality, i.e., 
eigencentrality. This is followed by a brief section where 
we show how the dynamical viewpoint afforded by Markov 
Stability seamlessly extends to the case of directed graphs, thus 
allowing us to recast the concept of structural communities 
in terms of fiow communities. The final section illustrates the 
framework with the analysis of synthetic benchmarks and real- 
world examples, and discusses computational and practical 
issues for Markov Stability, e.g. assessing the presence of 
robust partitions, or of a hierarchical structure. 

2 The simple random walk and com¬ 
munity detection: Discrete-time Markov 
Stability for undirected graphs 

To make our arguments more precise, we first review briefiy 
some of the notation and results from O, CD, where 
mathematical proofs and further results can be found. For 
simplicity, we start by considering the case of undirected 
graphs, although we will see below that the arguments extend 
to directed graphs too. 

Consider an undirected graph with N nodes and weighted 
adjacency matrix A e such that the weight of the 

link between node i and node j is given by Aij = Aji. The 
vector containing the degrees (or strengths) of the nodes is 
d = Al, where 1 is the N x 1 vector of ones, and we 
also define the diagonal matrix D = dmg{d). The sum of all 
degrees is 2m = 1^d. The combinatorial graph Laplacian is 
defined SiS L = D — A and the normalized graph Laplacian 
is defined as £ = . Both Laplacians are 

symmetric nonnegative definite, with a simple zero eigenvalue 
when the graph is connected (3^ . We denote the trace with 
the notation Tr[ ]. 
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Consider the simple (unbiased) random walk governed by 
the standard dynamics: 

Pt+i = Pt = PtM, (1) 

where p denotes the 1 x dimensional probability vector and 
M is the transition matrix. Note that following the Markov 
chain literature, the probability vectors are defined as row 
vectors. Under the assumptions of a connected, undirected, 
and non-bipartite graph this dynamics converge to a unique 
stationary distribution 


TT = cF/2m. 


( 2 ) 


Each partition of the graph into c communities is encoded 
by a X c indicator matrix H with Hij G {0,1}, where 
a 1 denotes that node i belongs to community j. Given 
a partition H, the clustered auto covariance matrix of the 
diffusion process at time t is: 

Rt{H) = [UM* - H, (3) 

where 11 = diag(7r). The c x c matrix R{t) reflects the 
probability of the random walk to remain within each block 
(diagonal elements) and to transfer between blocks (off di¬ 
agonal elements) after a time t. Consequently, we define the 
Markov Stability of the partition H as 

rt{H) = min TV TV [Rt{H)] , (4) 

0<s<t 

the approximation coming from the computational obser¬ 
vation that TT[Rt{H)] is mostly monotonically decreasing 
for empirical graphs 1^ . A ‘good’ partition over a time 
scale t has well-defined communities that preserve probability 
fiows within them, hence maximizing the trace of Rt and, 
conversely, the Markov Stability rt{H) can be seen as a quality 
function for a partition of a graph as a function of the time 
horizon of the random walk. 

The Markov Stability rt{H) can be used to rank partitions 
of a given graph at different time scales or, alternatively, rt{H) 
can be used as an objective function to be maximized for every 
time t in the space of all possible partitions of the graph: 


rt = max rt(i^). 


(5) 


Such an optimization results in a sequence of partitions 
optimal over different time interval. Although this optimization 
is NP-hard, a variety of efficient optimization heuristics for 
graph clustering can be used, as discussed in later sections. 

The discrete-time Markov Stability rt{H) for undirected 
graphs encompasses several well-known heuristics and has 
other desirable theoretical properties, some of which we high¬ 
light here succinctly (see ifT^ . |[T^ for proofs): 

• Discrete-time Markov Stability at time t = 1 is equal to 
the ‘usual’ modularity Qconf, i-e., with the configuration 
model as null model Oil, El: 


ri{H) = Tr 


tT t A 




— Qconi • (b) 



Fig. 1. Unfolding the multiscale community structure of 
a hierarchical network as a function of Markov time. 
As an illustration, consider a hierarchical graph generated as 
follows (36]: start with a pair of nodes connected by a link of 
weight c < 1, duplicate them and add a link of weight between 
all pairs of nodes in different modules. Iterate the procedure K 
times to obtain a fully connected, weighted network of 2^ nodes. 
The figure shows a network with 2^ = 16 nodes with edges 
shaded according to their strength (c = 1/4). By symmetry, 
the natural partitions are into 16 single nodes, 8 pairs (colours), 
4 tetrads (shapes) and 2 groups of 8 nodes (upper and lower 
hemispheres). Evaluation of the Markov Stability rnorm(t) shows 
that, as t grows, the optimal partition goes from 16 communities 
to 8 to 4 to 2 over different time intervals. 


• Markov Stability at time t = 0 is equivalent to the Gini- 
Simpson diversity index of the partition H (381: 


ro{H) = l-Y,{^hcf = (7) 

C=1 


where he is the C-th column of the matrix H. is 
a measure of entropy of the partition according to the 
values of tt, i.e., the degree. GSt^ is large when the 
partition has many communities of equal size (according 
to tt), and is low when the partition has few and uneven 
communities. GS^^ is maximum for the partition into 
one-node communities. This index is well known in 
economics (Hirschman-Herfindahl index (391 ) and infor¬ 
mation theory (Renyi entropy (40l ). among others. 

• The probability of changing community in one step 


ro{H)-rFH) = l-TT 


' rj. A 
—H 
2m 


= Cut, 


( 8 ) 


is a measure of the cut induced by the partition, i.e., the 
fraction of edges between all the communities. 

• The long-term behavior of rt is governed by the nor¬ 
malized Fiedler eigenvector associated with the second 
dominant eigenvalue of M, i.e., that which is closest 
to 1 in absolute value. Hence the optimal community 
structure as t ^ oc is typically given by the bipartition 


1. Close-to-bipartite graphs are the exception: they have a strongly negative 
eigenvalue whose odd and even powers generate an alternating rt. 


























4 


according to the sign of the entries of the normalized 
Fiedler eigenvector 03, CD- 
• Spectral algorithms (either iterative or based on several 
eigenvectors at a time) are classic relaxation heuris¬ 
tics BTI . ll42l for the optimization of a variety of NP- 
hard partitioning quality functions, including modular- 
ity m or normalized cut l44l . We have shown that 
spectral clustering methods provide exact procedures for 
the optimization of Markov Stability at long times. 


3 Continuous-time Markov Stability: 

THE DYNAMICAL ORIGIN OF DIFFERENT QUAL¬ 
ITY FUNCTIONS 

We now consider continuous-time Markov processes associ¬ 
ated with the simple random walk O in order to extend 
our dynamics-based framework for community detection in 
undirected graphs. 


3.1 Normalized Laplacian Markov Stability 

Given the random walk Q on an undirected graph, a standard 
way to derive a continuous-time model is to assign a continu¬ 
ous Poisson process of given density at each node Ea, Ea. If 
we assume identically distributed Poisson processes (i.e., with 
identical waiting times) for all nodes, we obtain the standard 
diffusive dynamics: 

^ = _p [7 _ d-^A] = -p [D-^L] (9) 

Note that the operator D~^Li^ isospectral with the normalized 
Laplacian C since they are related by the similarity transfor¬ 
mation = D~^L. Hence the dynamics of 0 is 

dictated by the spectral properties of £. In particular, this pro¬ 
cess converges to the same unique stationary distribution ^ 
as the (discrete-time) simple random walk. As above, we thus 
define the continuous-time Markov Stability as: 


=Tr (Oe 


„-tD- 


— TT TT 


H 


( 10 ) 


where the notation rnorm emphasizes the connection with 
the normalized Laplacian. This continuous-time version of 
Markov Stability shares broadly similar properties with the 
discrete-time version and most of the discussion presented 
in Section applies here. For instance. Figure shows the 
results of the optimization of rnorm (^; H) over time and over 
the space of partitions for a simple example. Note that the 
Markov Stability explores the community structure at all 
scales (from finer to coarser) using the dynamic zooming 
provided by the Markov time of the diffusion process t. 
The relevant (time) scales emerge as the ones leading to 
persistent (robust) partitions over extended intervals of time. 
See Section!^ and Refs. (191, (^ for a discussion of some of 
the practical issues of the computational implementation and 
more illustrative examples. 

It is also instructive to consider the behavior of in the 
limit of small times, t ^ 0. Keeping terms to first order, we 


obtain the linearized Markov stability: 

^norm(^; = TnormiO; II) - t Tr 

= GSt^ — t Cut (11) 

= (1 — t) GSt^ + t Qconf (12) 

where we have used ([^-([8]) and the fact that Tr [H^LH] = 
2m—Tr AH ]. A few remarks about the linearized Markov 
Stability follow: 

• Analogously to the instantaneous probability rate 
of the walker escaping from its initial community 
— dcnormit; H)/dt\^^Q = Tt[H^LH]/ 2m is the Cut. 

• The Potts model heuristic proposed by Reichardt & 
Bornholdt Ea is exactly recovered as the linearized 
Markov stability. Hence we can see the Markov time t as 
the equivalent of a resolution parameter. From (V2) it also 
follows that the ‘usual’ modularity Ea, El is recovered 
at t = 1 for undirected graphs: 

r“”^(l;if) = Qeonf. (13) 

• Equation ( pTj ) provides an interpretation of Markov Sta¬ 
bility as a compound quality function to be optimized 
under two competing objectives: minimize the Cut size 
while trying to maximize the diversity GSt^, which 
favours a large number of equally-sized communities 
according to tt, thus resulting in more balanced partitions. 
The relative weight between both objectives is modulated 
as the Markov time t increases. 

The stationary distribution tt plays a key role in the defini¬ 
tion of the community quality function: 

• Firstly, tt can be understood as originating the null model 
of modularity, i.e., the model of random graph against 
which the network is compared to detect the signifi¬ 
cance of the communities. The null model in the ‘usual’ 
modularity is the configuration model, which randomly 
rewires the edges of a given graph preserving the degree 
of every node. The probabilistic description of this model 
is given by the outer product tt^tt, which in our dynam¬ 
ical interpretation corresponds to the expected transfer 
probabilities at stationarity for this Markov process. 

• Secondly, CS^^ measures the diversity of the partitions 
according to the node property tt. Hence, as the value of 
t grows, the optimization leads to balanced distributions 
of TT across communities, splitting nodes with high values 
of TTi into different communities. In this case, we tend to 
segregate nodes with high degree into different groups. 

3.2 Combinatorial Laplacian Markov Stabiiity 

Given a discrete-time random walk, a variety of continuous¬ 
time Markov processes are possible. Although in (|^ we 
assumed identical Poisson processes at all nodes, we have 
the flexibility to assign different waiting times at each node. 
An interesting choice is to consider that the waiting time at 
each node is inversely proportional to its degree, i.e., the 
walker spends less time on nodes of high degree. Using 
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an inhomogeneous rescaling of time this leads to a Markov 
process governed by the combinatorial Laplacian: 


^D-^d) = -pD-^ [D-A] 


dp 

dt 


/ 7\ P -^5 

{d) 


(14) 


where {d) = {1^D1)/N is the average degree and 

p = pD~^. The stationary distribution of (d is now the 
uniform distribution over the nodes: 


TTc 


l^/N, 


(15) 


and the combinatorial continuous-time Markov Stability is: 


rcomb(f;-ff) = Tr \h^ H 

The corresponding linearized version is then: 


(16) 


ri“„b(f;F) = GS,^-fCut (17) 

= (l-f)GS^,+fQER. (18) 


In this case, the stationary distribution tTc leads to a different 
diversity index: 


GS^, = 1 - Y,{l^hc/Nf = l-Y. i^c/Nf , (19) 

C=1 C=1 


where nc is the number of nodes of community C. The 
modularity associated with this process is: 


Qer — — Cut — Tr 





( 20 ) 


which is precisely the modularity based on the Erdos-Renyi 
(ER) null model with a probabilistic description given by 
the outer product This version of modularity was 

originally discussed by Newman IH, fSll and has been 
recently studied against network benchmarks 1481 . Based on 
our arguments above, the combinatorial Markov Stability op¬ 
timizes partitions that balance the Cut against the diversity tTc, 
which ignores degrees and counts only the fraction of nodes 
present in each community. Hence, it is more likely to group 
nodes with high degree in the same community when using 
combinatorial Markov Stability, as we will discuss below. 

Einally, we remark that at long Markov time scales, the com¬ 
binatorial Laplacian dynamics recovers the bipartition based 
on the classic heuristic of the signs of the components of the 
Eiedler eigenvector 1411 . which constitutes the basis of several 
spectral algorithms. As stated above, the normalized Laplacian 
version converges to the bipartition based on the normalized 
Eiedler eigenvector, which is also used in other spectral 
algorithms like Shi-Malik 1^ . Seeing those algorithms as the 
coarser extreme of a range of community detection problems 
provides additional insight into the meaning and differences 
between those popular spectral algorithms. 
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Fig. 2. Dynamical coherence in synchronization and 
community structure. We computed the coherence of Ku- 
ramoto oscillators in this toy network and represented it in the 
bottom panel by using a colour code, from black to red as the 
coherence grows. The lower triangle is always more coherent 
than the upper triangle. The partitions obtained by optimizing the 
combinatorial Markov Stability rcowbit^H), related to the Erdos- 
Renyi null model, capture this behavior. On the other hand, 
the optimization of the normalized Markov Stability rnom{t;H), 
related to the usual configuration model, does not find the 
relevant sequence of partitions. 


3.3 Normalized vs Combinatorial Markov Stability: 
some examples 

The relevance of dynamical coherence 

As discussed above, a driving force in the definition of 
quality functions for community detection has been the use of 
null models, i.e., random graph models that preserve certain 
properties of the graph under study and act as bootstraps to 
establish the significance of communities. Early on, it was 
proposed a, Ell that the configuration model should be 
preferable to Erdos-Renyi as the null model, because the 
former takes into account the degree heterogeneity typically 
found in realistic networks. However, it has been recently 
shown 1481 that the Erdos-Renyi model behaves at least as well 
as the configuration modularity on benchmarks 14^ and leads 
to improved results in particular graphs. 

Under our dynamical framework, the two null models cor¬ 
respond to the stationary distributions of the Markov processes 
governed by the normalized and combinatorial Laplacians. 
The two Laplacian dynamics can emerge naturally in the 
modelling of different continuous-time dynamics on networks, 
such as heat diffusion ISOl . M, the linearization of Kuramoto 
oscillators cni, na, or consensus dynamics ED, EH, ED. 
In the important cases when the dynamics of the system is 
governed by the combinatorial Laplacian (e.g., synchroniza¬ 
tion, consensus, or vibrational dynamics), we expect that the 
relevant dynamical groupings should correspond to commu¬ 
nities obtained using the combinatorial version of Markov 
Stability (i.e., corresponding to the ER null model) and not 
the canonical configuration model. 
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Figure illustrates this point by examining relevance of 
dynamic communities in synchronization dynamics on a toy 
network made of two triangles: links in the upper triangle 
have weight 5; links in the lower triangle have weight 25; 
and they are connected by links of weight 1. The dynamics of 
the network is given by the Kuramoto model with uniform 
frequencies, a prototypal model for synchronization where 
each node has a phase (j)i evolving as 

= w + Aij sm{4>j - 4>i). (21) 

3 

The coherence between nodes i and j is measured by the 
order parameter pij{t) = (cos(0i(t) — (j)j{t)))ic, where the 
average is performed over an ensemble of random initial 
conditions. The coherence pij{t) computed from simulations 
(bottom panel) shows that the lower triangle is always more 
coherent than the upper triangle, as expected. If we threshold 
to find coherent clusters Ga, the first group detected is 
the lower triangle, followed by the upper triangle at later 
times. If we use the combinatorial Markov Stability on this 
toy graph, this sequence of partitions is correctly uncovered. 
This follows unsurprisingly from our dynamical interpretation 
since the linearization of the Kuramoto dynamics leads to the 
combinatorial Laplacian. In contrast, rnorm(^) does not recover 
this result, as it first uncovers a dynamically irrelevant partition 
where the upper triangle is found. Interestingly, numerics on 
Kuramoto dynamics ifToi . Ea have shown that the ‘usual’ 
modularity Qconf is only optimized for near-regular graphs, 
i.e., when it is equivalent to the true optimization performed 
by the dynamics, Qer- Therefore, if we are interested in 
coherent Kuramoto communities (e.g., motivated by power 
grid applications (541), the partitions found with the ‘usual’ 
modularity could be misleading. On the other hand, if we are 
interested in the study of probabilistic diffusive dynamics, the 
relevant communities should follow from the study of rnorm(^)- 



Normalized Combinatorial 


Fig. 3. Different random walks, different community 
structure: the C. Elegans neural network. The choice of 
Laplacian dynamics leads to different communities in this real- 
life example. Here we present the partitions at t = 7.8 that 
optimize rnorm (left) and rcomb (right) consisting mainly of 3 large 
communities in both cases (indicated by different colors). The 
nodes are displayed along the vertical axis according to their 
degree centrality. The normalized Laplacian Markov Stability 
biases towards equicentral communities thus leading to a sepa¬ 
ration of high degree nodes into different communities, whereas 
high degree nodes can be grouped within the same community 
for the combinatorial Laplacian version. 


An optimization perspective: distinct cost functions 

Further insight into the communities for each version of 
Markov Stability can be gained by examining the role of the 
stationary distribution of the Markov process in the definition 
of the diversity index appearing in the compound cost function 
to be optimized. From the definitions Q and ( p^ of the di¬ 
versity indices GSt^ and (associated with the normalized 
and combinatorial versions of Markov Stability, respectively), 
it follows that the normalized version balances communities 
with respect to their edge volume while the combinatorial 
version balances communities with respect to their node vol¬ 
ume. Therefore, the normalized version (related to the ‘usual’ 
modularity) tends to separate nodes with high degree into 
different communities. This may lead to unexpected results, 
e.g., in assortative networks, where high degree nodes tend to 
be strongly connected to one another, yet could be split when 
using quality functions based on the configuration model. 

To illustrate this point, consider the community structure 
uncovered in the symmetrized version of the C. elegans neural 
network, a weighted network with 297 nodes and 2m = 17598 
edges. The partitions found by the combinatorial and normal¬ 
ized versions of Markov Stability are significantly different— 
not unexpectedly since the graph is far from being degree- 
homogeneous. In Fig. we present the partitions at t = 7.8 
for both versions consisting of mainly 3 large communities. 
As discussed, the optimization of rnorm (^) tends to balance the 
total degree communities C, while rcomb (^) 

tends to balance the number of nodes nc of the communities. 
Indeed, for the combinatorial Laplacian, the total degree 
of each of the three communities are {1984,11782,3424}, 
whereas these numbers are more balanced for the normalized 
Laplacian: {5753,5561,6284}. On the other hand, the fact 
that the combinatorial Markov Stability does not penalize as 
much grouping together nodes with high degree into the same 
community can also be seen in Fig. The high degree nodes 
tend to be split evenly among the three communities for the 
normalized Laplacian, while the combinatorial Laplacian has a 
disproportionately large number of high degree nodes grouped 
together in the red community, less so in the green community 
and even fewer in the blue community. More specifically, the 
top 20 nodes with the highest degree are distributed among 
the three communities in the ratios {18, 2, 0} for rcomb while 
the corresponding ratios for rnorm are {13,5,2}. 


3.4 The simple random walk and its continuous-time 
versions: degree as centrality 

Our discussion above leads to the following generalization of 
the continuous-time versions of the simple (unbiased) random 
walk. When taking the continuum limit, the waiting times at 
each node can be weighted by any power of the degree: 


^D*(cr'') = -pL)'' D-'^[I - D-'^A] 

dp _ 1 

^ dt~ 


( 22 ) 


where the notation (...) denotes the average over all the nodes, 
i.e., {d~^) = {1^D~^l)/N, and we have introduced the k- 
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scaled Laplacian: 

Lk = D-'‘[I-D-^A]. 

The stationary distribution of ( |22l > is then 

TTfc = 1^Z)''+V (I'^D^+^l) , 
and the corresponding /c-scaled Markov Stability is: 

= Tr -TT^^TTfe) H 

The linearized version reads: 






= rs f 

(#+i) (d-fe) 


Cut, 


(23) 

(24) 

(25) 


H 


(26) 


and, again, the diversity index of the partition is measured as 
a function of the stationary distribution 


GS,, =1-1] {i^D^+^hc! {1^D^+H)y 


(27) 


C=1 


Clearly, /c = 0 corresponds to making the waiting time inde¬ 
pendent of the degree and leads to the normalized Laplacian 
Markov Stability, while k = —1 corresponds to making the 
waiting time inversely proportional to the degree and produces 
the combinatorial Laplacian version. 

This generalization allows us the flexibility to modulate the 
effect of degree centrality in community detection using other 
continuous-time dynamics. We could consider a model where 
the waiting time is proportional to the degree, i.e., k = 1. This 
could be interpreted as the model of a random web surfer, 
spending on average more time reading a page with higher 
number of links. The community detection on such a system 
would then be based on the non-standard Laplacian Li = 
D~^ — D~^ A and the diversity index ( [27] ) will try and balance 
communities according to the square of the degree, making it 
even more unlikely to group high degree nodes in the same 
community. If, on the contrary, we consider a model where 
the waiting times have an inverse square dependence on the 
degree (k = —2), the diversity index ( [27] ) would then be based 
on the inverse of the degree, and the community detection 
will tend to push neighboring high degree nodes together in a 
single community, while low degree nodes stand separated, as 
in a core-periphery decomposition. This phenomenon will be 
more acute as we make k more negative, whereas, conversely, 
a large and positive k will put the emphasis on separating the 
few top degree nodes, disregarding almost entirely the effect 
of the majority of nodes. 

This extended discussion of the simple random walk and 
associated Markov processes highlights the connection of 
dynamical community detection with concepts of centrality. 
Measures of centrality aim at rating how connected nodes 
are with the rest of the network. The weighted degree is 
perhaps the most elementary concept of centrality—indeed, it 
is sometimes referred as ‘degree centrality’. As shown above, 
the degree appears as the stationary distribution of the simple 
random walk Q, and the optimization of the quality function 
for community detection balances the partitions according to 


the diversity of degree centrality. In particular, it is optimal to 
split apart highly central nodes (i.e., with high degree in this 
case) into different communities for short enough Markov time 
scales, and to aim towards balanced intra-community edge 
centrality. The continuous-time versions are able to modulate, 
amplify, attenuate, cancel or even invert the effect of degree 
centrality as the power k is varied. We consider the connection 
of dynamical community detection with other measures of 
centrality in the following section. 

4 Community detection based on other 
NOTIONS OF centrality: the Ruelle- 
Bowen random walk 

4.1 The role of centrality in community detection 

In different applications, it might be desirable to employ other 
measures of centrality as the linchpin for community detection. 
We can achieve this using the random-walk framework dis¬ 
cussed above. Many discrete-time random walks other than the 
simple random walk may be performed on a network. We then 
may think of the stationary distribution of every random walk 
as a centrality measure. Every random walk with transition 
matrix M will then be associated with a dynamical Markov 
Stability quality function, and the corresponding community 
detection will produce optimized partitions which are balanced 
according to different measures of centrality. A generic way to 
generate random walks is to bias the simple random walk ll55l . 
For instance, one may attribute a positive number bi to every 
node i (e.g., a property related to a measure of centrality) 
and let a random walker at i jump to j with probability 
proportional to hiAijhj. 

Once the discrete-time random walk (and its associated 
centrality) is chosen, different continuous-time processes can 
be obtained. Generically, this is done by combining two 
ingredients: the transition probabilities of the discrete-time 
random walk (i.e., the row-stochastic matrix M) and the 
waiting times of the continuous-time process at each node 
(compiled in a node vector w). The resulting process is then: 

^ = -pW-yi-M) (28) 

with W = diag(u;). These two ingredients come into play 
differently in determining the corresponding Markov Stability 
function for community detection. The discrete-time random 
walk deflned by M determines the stationary distribution TTdisc 
on nodes. On the other hand, the continuous-time station¬ 
ary distribution on node i, or node centrality, is given by 
^i'7rdisc,i/(^)5 where {w) is the normalization constant TTdisc^- 
As shown in the examples above, the choice of waiting times 
can thus modulate the effect of the node centralities. The 
centrality of edge ij, on the other hand, is the probability that 
an observed transition links i to j, which does not depend 
on the time elapsed between transition but rather on the 
respective frequencies of transitions given by 7rdisc,i^ij- Edge 
centralities are therefore given by Ildisc^, hence completely 
determined by the discrete-time transitions and unaffected by 
waiting times. As a result, the discrete-time transitions and 
waiting times have a different effect on the resulting Markov 








Stability function: waiting times have no influence on the edge 
centrality but afford complete control over the node centrality 
(and on the Gini-Simpson term of the cost function), whereas 
the Cut term is completely determined by the edge centralities 
(i.e., the underlying discrete-time random walk). At long times, 
the optimal split is provided by the sign pattern of the second 
eigenvector of the ‘generalized Laplacian’ W~^{I — M), 
which depends both on the discrete-time transitions M and 
the waiting times W. We now explore a classic discrete-time 
random walk with distinctive properties. 

4.2 Community detection according to the Rueile- 
Bowen random waik 

A particularly interesting example is the random walk intro¬ 
duced by Ruelle, Bowen and others |[56l . Consider a graph 
with adjacency matrix A = under the usual assumptions 
of connected, undirected, and non-bipartite, for simplicity. 
An important notion of centrality is associated with v, the 
dominant eigenvector of A (i.e., the eigenvector with the 
largest eigenvalue): 


Av = Aiv. 


(29) 


The eigencentrality HSll of node i is given by Vi, its corre¬ 
spondent component of this eigenvector. 

The discrete-time Ruelle-Bowen (RB) random walk is de- 
flned such that the transition between nodes i and j occurs 
with probability ViAijVj: 


Pt+i = Pt 



AAv 


— Pt ^RB, 


(30) 


with p the lx N probability vector and Av = diag(v). Under 
such assumptions, the unique stationary distribution of the RB 
random walk is 


= (31) 

since (l^Ajl) = v^v = 1 for the normalized eigenvector. 
The stationary distribution ttrb can be seen as a centrality mea¬ 
sure, which is called entropy rank (for the unweighted case) or 
free energy rank (for the weighted case) 1581 . thus essentially 
equivalent to eigencentrality in terms of ranking (although the 
concepts diverge in the directed case, not analyzed here). 

This classic random walk has an interesting interpretation in 
terms of entropy: it is maximally exploratory in the sense that 
its per-step entropy is maximal. More precisely, let h denote 
the (Kolmogorov-Sinai) entropy rate of the random walk, 
which is the average per-step entropy that is asymptotically 
approached for long paths, and let E be the expectation of the 
edge transition energies Eij, such that Aij = exp(T^^j). Then 
the RB random walk maximizes the ‘free energy’ h -h E. It 
therefore tends to make all paths of same length equiprobable, 
with a bias to make high energy paths more probable 15^ . 
Beyond its thermodynamic properties, the Ruelle-Bowen walk 
naturally emerges in other contexts, such as the computation 
of quasi-stationary distributions 1^ . 1^ . 

Similarly to the simple random walk, we can associate 
continuous-time Markov processes to the RB random walk. 


The simplest is given by the homogeneous waiting times: 


^ = _p [I-Mrb] 


( 32 ) 


with Mrb as in ( [^ . The node stationary distribution of ( 
is given by ( [3T] ), whereas the edge centralities are given by 
the matrix AJMrb = AvAAv/Ai. The full and linearized 
versions of the RB Markov Stability follow closely the ex¬ 
pressions in ([Tg-Q. This continuous-time process can be 
generalized through the choice of waiting times. 

The RB Markov Stability has connections with other heuris¬ 
tics in the literature. For instance, the spectral algorithm 
associated with the RB random walk on an undirected graph 
makes use of the second eigenvector of the adjacency matrix 
A, similarly to the ‘adjacency spectral clustering’ of Sussman 
et al. |[^ . To illustrate the flexibility of the framework in 
designing cost functions associated to different notions of 
communities, let us consider waiting times W = DA~‘^. This 
choice makes thenode centralities proportional to the degree, 
since the discrete-time RB walk induces stationary probability 
on nodes proportional to A^ (see Eq. (31), while the edge 


centralities, unaffected by waiting times, are still determined 
by the edge entropy rank. The linearized Markov Stability 
optimization will now look for communities balanced in terms 
of number of edges (through diversity term) while cutting 
edges with low entropy rank (through the Cut term). 

As a simple example of the impact of such a choice on 
the outcome of partitioning, consider the graph A — B — C 
composed of two A-cliques A and B and a A-cycle C, 
interconnected by single edges. From the point of view of 
the simple random walk Markov Stability, cutting the A — 5 
edge or the B — C edge is indifferent as far as the cut term 
is concerned. However, RB Markov Stability favours cutting 
the less central B — C edge, thus isolating first the ‘hollow’ 
module C on the account of cut minimization, while the Gini- 
Simpson term tends in this case to keep apart high-degree 
nodes, thus inducing non-trivial results isi. This priming of 
eigencentrality in the allocation of community splits could 
be desirable for particular applications, e.g., when analyzing 
networks with highly heterogeneous eigencentrality across 
the nodes. This will be particularly important in networks 
whose node eigencentrality is not fully captured by the degree 
centrality 1621 . e.g., when a low-degree individual is connected 
to high degree others or in which a high-degree node is only 
connected to low degree others. 

Finally, an interesting property of the Ruelle-Bowen random 
walk is its universality. Any linear dynamics Xt+i = Xt A, 
where Xt is a row vector of real entries over the nodes and A is 
a nonnegative primitive matrix, can be transformed to make it 
interpretable as a random walk 1631 . Hence, besides consensus, 
heat diffusion, linearized synchronization, etc, random walks 
can also be used to represent a wider class of dynamics on 
networks. 


5 Markov Stability for Directed 
Graphs 

Another advantage of the dynamical framework for commu¬ 
nity detection introduced above is that it extends naturally to 
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directed graphs, whereas the extension of structural quality 
functions, such as modularity, to the case of directed graphs 
is not trivial. For instance, although it has been argued 1641 . 
1651 that the null configuration model in modularity should 
become (iind^t/2m in order to account for the directionality 
of the links, this choice and justification of the null model for 
directed graphs is not unique. Under our dynamical viewpoint, 
the notion of community becomes that of fiow community, 
and the relevant centrality is pagerank with its associated null 
model, as we show below. 

Consider the simple random walk for a directed graph wit 
the (non-symmetric) adjacency matrix: A ^ . Each node 

has an in-degree, collected in the vector din = and an 

out-degree, collected in the vector dout = Al, i.e., the sum 
of the weights of the edges directed at and departing from 
the node, respectively. The simple random walk in this case 
is given by 

Pt+l = Pt = Pt -^dir (33) 

where Dout = diag((iout) and For nodes where 

dout.i = 0, we set Dout{i,i) = 1. 

For simplicity, consider first the case when the graph is 
strongly connected and aperiodic. Then the random walk ( [^ 
is ergodic and has a unique, stationary distribution TTdir cor¬ 
responding to the dominant left eigenvector of The 

stationary distribution TTdir is called pagerank, a key measure 
of centrality in directed graphs 1661 . We can then define the 
directed Markov Stability based on the random walk ( [3^ , 
which has the same form as Q and This quality function 
can be used the same way as the undirected version to extract 
multiscale structure in graphs by using the Markov time t as 
a resolution parameter. The directed Markov Stability at time 
t = 1 which, following I© above, corresponds to our quality 
function most closely related to ‘directed modularity’ : 

= Tr [D^(ndirD-iA - 7r^,7rdir)D] • (34) 

Note that the null model we obtained here corresponds to the 
outer product of the normalized pagerank vector in 

lieu of in- and/or out-degree vectors M, (SI. 

Clearly, using gives different results to structural ver¬ 
sions of directed modularity based on in- and out-degree null 
models. While optimization of ( [34| ) favours partitions with per¬ 
sistent fiows of probability within modules, modularity favours 
partitions with high densities of links and is blind to the fiow 
actually taking place on these links. To illustrate the difference, 
consider the toy example given by CD (Fig. 1^, on which 
the directed random walk is ergodic. In this case, optimizing 
the in/out-degree modularity of this toy network leads to a 
partition where heavily weighted links are concentrated inside 
communities, as expected. On the other hand, optimization of 
directed Markov Stability leads to a partition where fiows are 
trapped within modules. It is also interesting to stress that the 
partition that optimizes ( [34| ) also optimizes the map equation 
proposed by Rosvall and Bergstrom lfT^ . For an independent 
study of directed modularity based on other arguments, see 
Kim et al 1671. 

Our definition of directed Markov Stability relies on the 
condition that the dynamics is ergodic. When the directed 





Fdir,! = 0.42 





rdir,l = 0.33 


Fig. 4. Directed Markov Stability versus extensions of 
modularity. In this toy network \T^, the weight of the bold 
links is twice the weight of the other links. The partition on 
the left (indicated by different colors) optimizes directed Markov 
Stability ([^, which intrinsically contains the pagerank as a null 
model. The partition on the right instead optimizes an extension 
of modularity based on in- and out-degrees |63, (65]. Hence 
directed Markov Stability produces flow communities, whereas 
the extension of modularity ignores the effect of flows. 


network is not ergodic, it is common to generalize the standard 
random walk by incorporating a random teleportation term 
(also known as ‘Google teleportation’). If the walker is located 
on a node with at least one outlink, it follows one of those out- 
links with probability r G (0,1). Otherwise, with probability 
1 — r, the random walker teleports with a uniform probability 
to a random node. Instead of M^ir, the new transition matrix 
of the random walk ( [33] ) becomes: 

Matir) = rMdir + [(1 -t)I + t diag(a)] (35) 

where the A/ x 1 vector a is an indicator for dangling nodes: 

= 1 if (iout,i = 0 (and the corresponding row of M^ir is 
assumed to be zero) and = 0 otherwise. Upon visiting a 
dangling node, a random walker is teleported with probability 
1. It is customary to use the value r = 0.85. The teleportation 
scheme is known to make the dynamics ergodic and to ensure 
the existence of a single stationary solution TTdir(r) that is an 
attractor of the dynamics. Indeed, teleportation is sometimes 
introduced even in the ergodic case to improve the numerical 
convergence of pagerank computation. 

Finally, we remark that, as for the undirected case, there 
are continuous-time versions of directed Markov Stability. The 
simplest is given by the corresponding Kolmogorov equation: 

^ = -p [J - Mdir(T)], (36) 

and our discussion above applies to these processes too. An 
application to a large graph of airport connections is presented 
in the next section. See also 1(6^ for an application to social 
network analysis. 

6 Computational methodology and 

PRACTICAL CONSIDERATIONS 

Given a network, and based on modelling considerations or 
other assumptions, we can choose a discrete- or continuous¬ 
time Markov process to scan dynamically the structure of the 
graph at all scales. As shown in the toy example of Figure 
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the optimization of the chosen Markov Stability across time 
leads to a sequence of partitions that are optimal at different 
time scales. The extraction of these optimized partitions is the 
first step to uncover the multi-scale modular structure of the 
network (if present), but the practical application of the method 
still involves at least two non-trivial steps, which we now 
discuss in conjunction with several larger examples. Although 
the examples in this section exhibit a relatively hierarchical 
community structure, in Supp.Inf. we illustrate and measure 
quantitatively non-hierarchical multi-scale structures. 

6.1 Optimization of Markov Stabiiity 

Although it has been shown that modularity optimization is 
NP-hard |[6^ , several heuristic algorithms have been proposed 
to provide satisfactory solutions, in the sense that they ef¬ 
ficiently recover planted solutions in benchmark graphs, or 
that they can uncover groups that are clearly meaningful (e.g. 
classes in a school social network, etc) B- It has also been 
shown that in real-world examples the modularity landscape 
over partitions tends to exhibit large rugged plateaux, making 
it possible to find an approximately optimal partition fTOll . 

We will now show that it is always possible to rewrite 
the Markov Stability for any choice of random walk as 
the modularity of another symmetric graph. This observation 
has important practical implications, as it makes it possible 
to use any modularity-maximization algorithm, e.g. spectral 
or greedy, for the optimization of any version of Markov 
Stability. For example, consider the discrete-time stability 
r{t) = Tr — 7r^7r)i^], for transition matrix M 

and the corresponding centrality tt. It is easy to see that this 
is the usual modularity for the graph of weighted adjacency 
matrix A = (IIM^ + (nM^)^)/2, a symmetric matrix of 
degree sequence A1 = tt^. A similar observation holds in 
continuous time (where the exponential can be evaluated by 
Fade approximations), and also for the linearized versions of 
Markov Stability. 

Any modularity maximization algorithm can therefore be 
used for Markov Stability optimization. As some of those 
algorithms CD are empirically known to run in 0(m log m) 
on m-edge graphs, the most expensive step turns out to be 
matrix multiplication or computation of the exponential, which 
limits the application of full Markov Stability to graphs with 
N ^ 20000 nodes. These overheard costs are spared when us¬ 
ing the linearized version of Stability, which becomes the most 
suitable for the multi-scale analysis of very large networks 
N > 10^. In our applications below, we have used mainly the 
Louvain algorithm fTlI adapted to the optimization of Markov 
Stabilit}0 although spectral bisection methods fT^ for the 
generation of optimized partitions yields good results ifT^ . 

6.2 Robustness of partitions 

Once the sequence of optimized partitions is obtained, we 
need to select the most relevant scales (partitions) for our de¬ 
scription. This is a well-known challenge for multi-resolution 

2. An efficient code, also with a Matlab interface, can be down¬ 
loaded at http://wwwf.imperiaLac.uk/~mpbara/Partition_Stability/ or http:// 
michaelschaub.github.io/PartitionStability/ 



Fig. 5. Selecting robust partitions in the sequence of 
optimized partitions across Markov time, (a) The American 
football network (4] composed of A = 115 teams is known to 
be organized into 12 divisions. (Left) The block structure of the 
normalized variation of information ([^ between the optimized 
partitions at time t and t' and a long plateau in the number of 
communities indicates that the most persistent partition is made 
of 12 communities, as expected. (Right) The randomized version 
of the network, where links have been reshuffled while preserv¬ 
ing the node degrees, does not exhibit robust communities, (b) A 
benchmark hierarchical random network consisting of A = 640 
nodes with 3 levels: 64 modules of 10 nodes; 16 modules of 
40 nodes; 4 modules of 160 nodes [8]. We use one realization 
of the benchmark. Similarly to (a), the long plateaux in the 
number of communities and the block structure with low values 
o^V(V(t),V{t')) reveal the three levels of the hierarchy (left). No 
significant community structure is detected in the randomized 
network (right). Both sequences of partitions were obtained 
optimizing rnorm(t; H) with the Louvain algorithm [TTI . 


methods. Notions of robustness are usually considered when 
dealing with NP-hard optimizations to reflect the ruggedness 
of the landscape of the quality function to be optimized cni. 
In our approach, we establish the significance of a particular 
partition based on its robustness in three different ways ca, 
ca, ca, ca, ca, cu; (i) robust (persistent) across time; 
(ii) robust to small perturbations to the graph; and (iii) robust 
to the optimization algorithm and the starting point of the 
optimization. We now exemplify (i) and (iii). 

The basic notion is to evaluate the effect of these perturbing 
factors on the optimized partition: a partition is robust if 
such perturbations have little effect on the outcome and the 
perturbed result remains close to the unperturbed one. A 
popular way to compare two partitions Vi and V 2 is the 
normalized variation of information EH 


V{VuV2) = 


H{Vi\V2) ^ H{V2\Vi) 
log A 


(37) 


where H{Vi\P 2 ) is the conditional entropy of the partition Vi 
given V 2 , i.e., the additional information needed to describe 
Vi once V 2 is known assuming a uniform probability on the 
nodes. The conditional entropy is defined in the standard way 
for the joint distribution P(Ci,C 2 ) that a node belongs to 
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a) 




Fig. 6. Hierarchical benchmark and statistical tests. Bench¬ 
mark random network with N = 2560 nodes and 4 hierarchical 
levels (with modules of 10, 40, 160 and 640 nodes) 0. (a) 
The long plateaux in the number of communities (blue) and 
the dips in the normalized variation of information across time 
VV{\t)) (green) signal that the four levels of the hierarchy 
have been detected, (b) Same as (a) for a randomized version of 
the network preserving node degrees: no community structure 
is found at any scale. In both cases, A = 20/19 and the 
sequences of partitions were obtained optimizing rnorm(t;iT) 
with the Louvain algorithm. 



Fig. 7. Finding robust communities at multiple scales in 
the atomic network of hemoglobin, a protein tetramer. 
The atomic network of the protein is generated as detailed in 
Refs. IZll, [81] using physico-chemical potentials and atomic 
X-ray crystallographic data (PDB file: 1GZX). This weighted, 
undirected network has N = 8757 nodes (atoms) and 12813 
edges (bonds). The multi-scale nature of our method reveals 
relevant communities across scales, from small chemical group¬ 
ings to large-scale conformations, signalled by dips of the nor¬ 
malized variation of information. These dips deviate significantly 
from chemically-consistent randomized versions of the network 
(not shown; see [74l, [M]). Note the long plateau and dip of 
the normalized variation of information for the 4-way partition, 
corresponding to the identification of the four monomers in the 
hemoglobin tetramer. Here the combinatorial version of Markov 
Stability rcomb(t; H) was optimized, as it is more closely matched 
to the vibrational dynamics of the protein network. 


a community Ci of Vi and to a community C 2 of V 2 - The 
normalized variation of information V{Vi^V 2 ) ^ [0,1] has 
been shown to be a true metric on the space of partitions and 
vanishes only when the two partitions are identical. 

Within the Markov Stability framework, we use this metric 
to evaluate the persistence of partitions across time. By looking 
for block-diagonal regions with low values of V{V{t),V{t')), 
as well as plateaux in the number of communities as a 
function of time 1^ . we can detect the relevant partitions and 
scales without assuming them a priori. Two examples of this 
approach are shown in Fig. where we illustrate the detection 
of the relevant scale (12 communities) in a small real-life 
network of American football teams (N = 115), as well as 
three scales in a hierarchical benchmark random network with 
N = 640 nodes. The same notion is evaluated in Fig.|^ where 
we detect 4 hierarchical levels in a larger benchmark network 
with N = 2560 nodes by comparing partitions across time 
using the scaling factor A to evaluate V{V{t),V{Xt)). 

In addition to the robustness of partitions based on persis¬ 
tence across time, it is also helpful to evaluate the robustness 
of the solution with respect to the optimization. We do this 
by repeating the Louvain optimization many times (in excess 
of 100 random initial seeds for each Markov time) and 
evaluating the average normalized variation of information 
within the ensemble of optimized solutions. If a partition is 
robust to the optimization, we expect a small value (or a dip) 
in the normalized variation of information of the ensemble 
of optimized solutions, signaling a relevant partition. This 


robustness to the optimization probes the ruggedness of the 
landscape and can be tested for different optimization algo¬ 
rithms IITOI . Here we use the Louvain algorithmic heuristic, 
which has been shown to perform well both in benchmarks 
and real-life examples fTSll . In Figures |7] and we show 
the application of this approach to two large networks: an 
undirected, weighted atomic protein network with N = 8757 
nodes; and a directed, weighted network of airport connections 
with N = 2905 nodes. In both networks, we find relevant 
structure at different resolutions. Of note is that our results 
in the protein network are able to identify partitions corre¬ 
sponding to relevant chemical structures (involving only a few 
nodes), through secondary structures such as helices (involving 
several hundreds of atoms) to large conformational domains 
and, importantly, the subunits (involving several thousands of 
atoms). In the case of the airport network, the different levels 
of resolution reveal geographical and socio-political groupings. 
In this case, the directed character of Markov Stability is 
able to reveal communities with specific flow characteristics, 
including regions with focalized entry points coupled to a local 
asymmetric distribution network (e.g., Alaska and Greenland). 

The selection of the relevant scales is still an open area 
of research in multiscale community methods and has strong 
links with non-convex optimization. Our notions of robustness 
reveal that the optimized partitions found at peaks of the 
variation of information tend to be hybrid combinations of 
natural partitions with non-uniform resolution, splitting some 
but not all the coarser communities, thereby explaining a 
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Fig. 8. Flow communities at multiple scales in an airport network. The airport network contains N = 2905 nodes 
(airports) and 30442 weighted directed edges. The weights record the number of flights between airports (i.e., the network does not 
take into account passenger numbers, just the number of connections). Representative partitions at different levels of resolution 
with (b) 44, (c) 18 and (d) 5 communities are presented. The partitions correspond to dips in the normalized variation of information 
in (a) and show persistence across time (see Suppl. Info.). 

high sensitivity to the random seed or Markov time. In other 
cases, such peaks correspond to the coexistence of a few 
‘good’ partitions, which might indicate a tendency to flip 
between such outcomes and, hence, a lack of robustness. In 
this sense, the peaks in the variation of information tend 
to signal the separation between the relevant scales in the 
community structure of the network, and can also be related to 
the existence of non-hierarchical (yet multi-scale) community 
structure (see Supp. Info, for some examples). These topics 
will be the object of further work. 

7 Discussion 

Our work emphasizes the importance of choosing proper 
dynamical processes in order to uncover information in net¬ 
worked systems. Here, we have focused on random walk 
processes, which are known to be mathematically equivalent to 
a broad range of diffusive processes: heat diffusion, evolution 
on a (free) energy landscape (251, opinion dynamics on 
social networks and other kinds of consensus problems (831 . 

(52l . linearization of synchronization (84l . (3l and power 
networks (54l . among others. Importantly, using the random 
walk corresponding to the natural dynamics of the system 
allows us to find its central nodes (according to its intrinsic 
centrality measure) and to recover dynamically meaningful 
communities, i.e., the communities of nodes that best retain the 
diffusive flow for a certain time scale. If there is no intrinsic 
dynamics in the system, and hence no unique choice for the 


exploratory Markov dynamics, our approach provides tools to 
understand the effect of the different choices of random walks 
and associated centrality measures on the community structure 
obtained through Markov Stability optimization. 

More generally, our approach provides a unified viewpoint 
for a number of existing approaches, as summarized on Fig.|^ 
and Our approach paves the way for the development of 
metrics and algorithms that exploit real-world non-Markovian 
random walks (86l or incorporate non-trivial temporal patterns 
into diffusive models (87]| . Our work also opens perspectives in 
community detection by providing a dynamical interpretation 
of quality functions, and by interpreting the standard null- 
model paradigm in terms of stationary distributions na, a. 
The dynamical approach that we advocate here, not only 
generalizes the null model paradigm, but can also lead to 
fundamentally different quality functions. For instance, even 
the simple random walk on a directed graph leads to a Stability 
function containing the pagerank, which is not expressible 
in terms of combinatorial quantities, hence different from 
any null-model-based variant of modularity. The dynamic and 
null-model paradigms do overlap in a number of interesting 
cases. We have shown that for undirected networks, the two 
most common continuous-time dynamics, described by the 
normalized and combinatorial Laplacians, correspond to the 
two most meaningful null models, i.e., the configuration model 
and the Erdos-Renyi model. Through the intuition gained from 
the corresponding dynamics, we reinterpret the Erdos-Renyi 
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Fig. 9. Summary of the dynamics-based Markov Stability framework and connections with centrality measures, and 
other clustering and community detection methods in the literature. 


null model (long considered as inferior in the null-model 
literature |[85]| . 141 ) and show that it is linked to an optimization 
that tends to produce node-balanced communities, and can be 
more relevant under particular dynamical processes, consistent 
with the findings of Traag et al 1481 . The exploration of 
alternative random walks, such as the Ruelle-Bowen walk, also 
highlights the capability of introducing alternative measures of 
centrality and extending community detection to include non¬ 
standard Markov processes. 
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Appendix 


Hierarchical versus multiscale organization 

Our use of time as a resolution parameter enables Markov 
Stability to detect robust partitions at different scales without 
imposing a priori the coarseness of the partitions. Although 
some of the methods used to optimize Markov Stability can 
lead to hierarchical community structure (e.g., the use of 
recursive bipartitions via Shi-Malik uni), we also use opti¬ 
mization heuristics that do not impose such a constraint (e.g., 
the use of the Louvain algorithm IItTII optimized independently 
at each time). It is then interesting to check whether or not 
the sequence of partitions is compatible with a hierarchical 
organization. This problem requires the introduction of a 
quantity that measures whether the communities at time t' are 
nested into the communities at a subsequent time t > t'. A 
well-known information theoretic measure that is particular 
adapted for such a purpose is the normalized conditional 
entropy: 

H{V{t)\V{t')) = (38) 


which is also constrained to the interval [0,1] but is now an 
asymmetric quantity that vanishes only if each community 
of Vt is the union of communities of Vt'- The combined 
knowledge of V and H therefore allows us to uncover the 
significant partitions of the system and to verify if those 
partitions are organized in a hierarchical manner. For instance, 
the benchmark in Fig. is clearly hierarchical, as can be seen 
in Figure 10 1 , whereas the toy network in Fig. 10 3 shows 
that the sequence of the optimal partitions is not necessarily 
hierarchical. 


Consistency of robustness measures in the airport 
network 

As a complement to Fig. Fig. shows that the dips 
in the normalized variation of information of the ensemble 
of solutions (presented in Fig. are consistent with the 
presence of block-structure in the normalized variation of 
information between the optimized solutions found across time 
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Fig. 10. Lack of hierarchy in a toy network. We optimize 
the Markov Stability rnorm(i) of: (a) the hierarchical model 
in Fig. |^, and (b) a toy network (bottom panel) with 6 
nodes and links of different strength (thick links of weight 
5, thin links of weight 1). (a) The normalized variation 
of information V{V{t),V{t')) (left, same as in Fig. 5b), 
indicates the presence of three levels of a hierarchy. 
The conditional entropy H{V{t)\V{t')) (right) reveals that 
the obtained community structure respects a strict hier¬ 
archy, although the Louvain optimization method does 
not impose such a hierarchical structure a priori, (b) For 
the toy network, the normalized variation of information 
V{V{t),'P{t')) and the number of communities (left) reveal 
a sequence of partitions with 6, 4, 3, and 2 communities 
(shown bottom). The 3-way partition is especially robust. 
In this case, however, the sequence of uncovered parti¬ 
tions is not hierarchical since the three-way partition is not 
nested into the two-way partition. This is revealed by the 
conditional entropy H{V{t)\'P{t')) (right): there is a region 
o\t>t' in which H{V{t)\V{t')) > 0. 
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Variation of Information 
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Fig. 11. Consistency of two robustness measures: persistence across time and optimization. This figure complements 
Fig.|^ (Top) Colormap of the normalized variation of information V{V{t),V{t')) for the optimized partitions of the 
airport network across time. The dark blue blocks indicate plateaux of similar partitions (see Fig. and Figure [TO^). 
(Bottom) The normalized variation of information of the ensemble of solutions with respect to the random seed of the 
optimization (same as in Fig.|^). The Markov times delimiting the blocks (top) correspond to peaks of the normalized 
variation of information of the ensemble of solutions (bottom), while the dips fall within the squares. Some of these 
dips have been presented as representative partitions in Fig.|^ 





















